Abstract:
As caches become larger and are shared by an increasing number of cores, cache management is becoming more important. This paper explores collaborative caching, which uses software hints to influence hardware caching. Recent studies have shown that such collaboration between software and hardware can theoretically achieve optimal cache replacement on LRU-like caches. This paper presents Pacman, a practical solution for collaborative caching in loop-based code. Pacman uses profiling to analyze the patterns of an optimal caching policy in order to determine which data to cache and when. It then splits each loop into different parts at compile time. At run time, the loop boundary is adjusted to selectively store the data that would be stored under an optimal policy. In this way, Pacman emulates the optimal policy wherever it can. Pacman requires only a single hint bit on load and store instructions, for which some current hardware provides partial support. This paper presents results on both simulated and real systems, and compares the simulated results to related caching policies.
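The loop-splitting idea can be sketched in miniature. The hint interface below (`load_cached` / `load_bypass`) is a hypothetical stand-in for the single hint bit on load/store instructions mentioned in the abstract, and the boundary rule is a simplifying assumption:

```python
# Hypothetical hint API: real hardware would see the same load with the
# hint bit either cleared (cache normally) or set (bypass the cache).
def load_cached(data, i):
    return data[i]

def load_bypass(data, i):
    return data[i]

def process(data, cache_capacity):
    # Profiling the optimal (OPT) policy might show that only the first
    # `cache_capacity` elements are worth keeping when the loop's data is
    # re-traversed; the split boundary is then adjusted at run time.
    boundary = min(cache_capacity, len(data))
    total = 0
    for i in range(boundary):                 # hot part: cache normally
        total += load_cached(data, i)
    for i in range(boundary, len(data)):      # cold part: bypass the cache
        total += load_bypass(data, i)
    return total
```

Both halves compute the same result; only the caching behavior of the generated loads would differ.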
Abstract:
Cache-content duplication (CCD) occurs when there is a miss for a block in a cache and the entire content of the missed block is already in the cache under a block with a different tag. Duplication-aware caches can achieve a lower miss penalty by fetching, on a miss to a duplicate block, directly from the cache instead of accessing a lower level of the memory hierarchy, and can achieve lower miss rates by allowing only blocks with unique content to enter the cache. This work examines the potential of CCD for instruction caches. We show that CCD is a frequent phenomenon and that an idealized duplication-detection mechanism for instruction caches has the potential to increase the performance of an out-of-order processor with a 16KB, 8-way, 8-instructions-per-block instruction cache, often by more than 10% and by up to 36%. This work also proposes CATCH, a hardware mechanism for dynamically detecting CCD in instruction caches. Experimental results for an out-of-order processor show that a duplication-detection mechanism with a 1.38KB cost captures, on average, 58% of CCD's idealized potential.
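A minimal software model of duplication-aware lookup, assuming an idealized content comparison (a table keyed by block content); CATCH's actual hardware organization is not reproduced here:

```python
class DupAwareCache:
    """Toy model: only blocks with unique content enter the cache."""

    def __init__(self):
        self.blocks = {}      # tag -> block content (tuple of instructions)
        self.by_content = {}  # content -> tag of the resident duplicate

    def lookup(self, tag, fetch_block):
        if tag in self.blocks:
            return "hit"
        content = fetch_block(tag)   # idealized: hardware would use a
                                     # precomputed content signature
        if content in self.by_content:
            # Miss, but identical content is already resident under a
            # different tag: serve it from the cache (a cheaper miss),
            # and keep only the unique copy resident.
            return "duplicate"
        self.blocks[tag] = content
        self.by_content[content] = tag
        return "miss"
```

A "duplicate" outcome models the cheaper miss described above: the data comes from the cache rather than from lower in the memory hierarchy.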
Abstract:
In the mobile computing environment, making data access efficient is a challenge due to narrow communication bandwidth, frequent network disconnections, and limited resources. It is therefore necessary to cache data on the client side. In addition, a good cache consistency method is essential to ensure correctness. In this article, a row-based semantic cache with incremental versioning consistency (RSCVC) is proposed. In RSCVC, we design a semantic cache algorithm, a query trimming and optimizing algorithm, and a version-based consistency strategy. The RSCVC cache has two main advantages. On one hand, it markedly improves query response time and cache hit ratio. On the other hand, the version-based consistency enhances the stability of the system, especially in high-concurrency situations. Experiments demonstrate the efficacy of the proposed method and its superiority to state-of-the-art methods.
Abstract:
This article exposes and proves some mathematical facts about optimal cache replacement that were previously unknown or not proved rigorously. An explicit formula is obtained, giving OPT hits and misses as a function of past references. Several mathematical facts are derived from this formula, including a proof that OPT miss curves are always convex, and a new algorithm called OPT tokens, for reasoning about optimal replacement.
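The OPT policy in question is Belady's rule: on a miss with a full cache, evict the block whose next reference lies farthest in the future. A direct trace simulation (a standard construction, not the article's OPT-tokens algorithm) makes it easy to check properties such as the convexity of miss curves empirically:

```python
def opt_misses(trace, capacity):
    # For each position, precompute the next position at which the same
    # block is referenced (infinity if it is never referenced again).
    next_use = [float("inf")] * len(trace)
    last_seen = {}
    for i in range(len(trace) - 1, -1, -1):
        next_use[i] = last_seen.get(trace[i], float("inf"))
        last_seen[trace[i]] = i

    cache, misses = {}, 0        # block -> position of its next use
    for i, block in enumerate(trace):
        if block in cache:
            cache[block] = next_use[i]
            continue
        misses += 1
        if len(cache) >= capacity:
            victim = max(cache, key=cache.get)   # farthest next use
            del cache[victim]
        cache[block] = next_use[i]
    return misses
```

For the trace a b c a b d a c, the miss counts for capacities 1 through 4 come out non-increasing and convex, consistent with the convexity result stated above.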
Abstract:
Cache vulnerability due to soft errors is one of the reliability concerns in embedded systems. Dynamic reconfiguration techniques have been widely studied for improving cache energy consumption without considering the implications for cache vulnerability. Keeping useful data in the cache longer can improve energy efficiency by reducing miss rates; however, longer data retention increases vulnerability to soft errors. This paper studies the tradeoff between energy-efficiency improvement and reduction in cache vulnerability during cache reconfiguration. We propose a heuristic approach for both intertask and intratask cache reconfiguration in multitasking systems. Experimental results demonstrate that our approach can significantly improve both vulnerability (by 25% on average) and energy efficiency (by 21% on average) for the data cache without violating real-time constraints.
Abstract:
Confronted with the ever-increasing amount of mobile data traffic generated by mobile devices, edge servers are expected to provide caching resources for users so as to enable low-latency content delivery and relieve the burden on the backhaul network. Considering the limited caching capacities of edge servers, it is essential to design an efficient content placement strategy to increase the data offloading ratio, i.e., the proportion of requested contents that are delivered via edge caching rather than backhaul links. In this paper, we propose a spatially cooperative caching strategy for a two-tier heterogeneous network consisting of edge servers and caching helpers, aiming to reduce the storage space taken by duplicate contents on caching helpers and to increase the hit probability. We develop an analysis framework for the proposed strategy using tools from stochastic geometry, and maximize the hit probability by optimizing the caching probabilities of contents on edge servers and caching helpers. In particular, we derive the closed-form solution of the optimization problem for unit cache size. Simulation results match the analytical ones, and the optimal caching probabilities derived in this paper outperform existing caching strategies.
Abstract:
In an embedded system where a single application or a class of applications is repeatedly executed on a processor, the cache configuration can be customized so that an optimal one is achieved. An optimal cache configuration that minimizes overall memory access time can be found by varying three cache parameters: the number of sets, the line size, and the associativity. In this paper, we first propose two cache simulation algorithms, CRCB1 and CRCB2, based on the cache inclusion property. They realize exact cache simulation while dramatically decreasing the number of cache hit/miss judgments. We further propose three cache design space exploration algorithms, CRMF1, CRMF2, and CRMF3, based on our experimental observations. They can find an almost optimal cache configuration from the viewpoint of access time. Using our approach, the number of cache hit/miss judgments required to optimize a cache configuration is reduced to 1/10-1/50 of that of conventional approaches. As a result, our approach runs an average of 3.2 times faster, and a maximum of 5.3 times faster, than the fastest approach proposed so far. Our cache simulation approach achieves the world's fastest cache design space exploration when optimizing total memory access time.
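The cache inclusion property that the CRCB algorithms build on can be seen in its classic form: under LRU, a smaller cache's contents are always a subset of a larger one's, so a single pass over the trace yields hit counts for every cache size at once. The sketch below is Mattson's stack algorithm for a fully associative LRU cache; it illustrates the property only, not CRCB1/CRCB2 themselves:

```python
def lru_hits_all_sizes(trace):
    stack = []        # LRU stack: most recently used block at the front
    depth_hits = {}   # stack distance -> number of hits at that distance
    for block in trace:
        if block in stack:
            d = stack.index(block) + 1          # reuse (stack) distance
            depth_hits[d] = depth_hits.get(d, 0) + 1
            stack.remove(block)
        stack.insert(0, block)
    # A cache of size s hits exactly the references with distance <= s.
    hits, running = [], 0
    for s in range(1, len(stack) + 1):
        running += depth_hits.get(s, 0)
        hits.append(running)
    return hits       # hits[s-1] = hit count for an s-entry LRU cache
```

One simulation pass thus stands in for simulating every cache size separately, which is the kind of judgment-count reduction the inclusion property enables.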
Abstract:
With the development of smart mobile devices and various mobile applications, content-oriented services have become the most popular services, occupying network resources and resulting in high traffic load. To improve quality of experience in the radio access network and reduce the OpEx and CapEx of operators, wireless network virtualization and network slicing have emerged as promising ways for radio access networks to provide tailored services. Network slicing and optimization based on content-oriented services have therefore become a challenging research direction. In this paper, network slicing and resource optimization for content-oriented applications in a cache-enabled hybrid radio access network are investigated from a complex-network perspective. A Cooperative Network Slicing Framework Based on Content in RAN (CNSC-RAN) is presented. Based on CNSC-RAN, procedures for content-oriented static and dynamic network slicing are proposed. Content-oriented slicing, covering both content cache resources and communication resources, is modeled and analyzed. To obtain the optimized resources sliced for each content, an optimization problem is formulated to minimize the average system cost of obtaining the contents required by users. The problem is solved by a heuristic algorithm called CCSOA (Content-Centric Slicing Optimization Algorithm) within a dynamic content-oriented network slicing procedure that enables UEs with self-evicting contents. The performance of CCSOA is evaluated on metrics including hit rate, average cache occupation, average system cost, and request-traffic reduction to the macro-cell base station, in comparison with CEE and ProbCache. Simulation results confirm the effectiveness of CCSOA.
Abstract:
The cache memory has a direct effect on the performance of a computer system. Instructions and data are fetched from a fast cache instead of slow memory, saving hundreds of cycles. Reducing the cache miss ratio will therefore improve the execution time of an application. In this work, we propose cache memory designs that significantly reduce the number of conflict misses in the last-level multi-way set-associative cache. Each set is divided into a group of subsets: the first is referred to as the exclusive subset, and the rest are the shared subsets. The exclusive subset is configured as a traditional cache, where each block is mapped to the set whose index matches the block index. In addition to their standard cache indexing role, the shared subsets are configured to host blocks with different indices. A memory block can thus be mapped either to one subset of the exclusive type or to one of multiple subsets of the shared type. Since the proposed technique is based on combining multiple sets of the shared part to form a larger set that is shared among memory blocks with different indices, we have chosen the name "set folding." The decision as to where to map a memory block depends on the number of misses encountered at each of the potential target sets. To evaluate the proposed design on overall hit rate, twenty-three benchmarks from SPEC CPU 2006 were simulated using the SuperESCalar simulator. The proposed designs require a few extra storage bits, which add a small overhead in hardware complexity compared with a conventional cache; however, they achieve lower miss rates for most of the benchmarks.
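The placement decision can be sketched as follows; the grouping of sets and the tie-breaking rule below are illustrative assumptions rather than the paper's exact configuration:

```python
class SetFoldingIndex:
    def __init__(self, num_sets, group_size):
        self.num_sets = num_sets
        self.group_size = group_size     # sets that pool their shared subsets
        self.misses = [0] * num_sets     # per-set conflict-miss counters

    def candidate_sets(self, block_index):
        home = block_index % self.num_sets          # exclusive subset's set
        base = (home // self.group_size) * self.group_size
        # The shared subsets of every set in the same group may host the block.
        return [base + i for i in range(self.group_size)]

    def place(self, block_index):
        home = block_index % self.num_sets
        # Prefer the candidate with the fewest misses; on a tie, the home set.
        return min(self.candidate_sets(block_index),
                   key=lambda s: (self.misses[s], s != home))
```

A block under pressure in its home set can thus migrate to a less-contended set of the same group, which is how folding absorbs conflict misses.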
Abstract:
The gap between processor and main memory performance increases every year. Cache memories are widely used to overcome this problem; however, they are only effective when programs exhibit sufficient data locality. Compile-time program transformations can significantly improve cache performance, but to apply most of these transformations, the compiler requires precise knowledge of the locality of the different sections of the code, both before and after transformation. Cache miss equations (CMEs) provide a precise analytical description of cache memory behavior for loop-oriented codes. Unfortunately, solving the CMEs directly is computationally intractable due to the NP-complete nature of the problem. This article proposes a fast and accurate approach to estimating the solution of the CMEs. We use sampling techniques to approximate the absolute miss ratio of each reference by analyzing a small subset of the iteration space. The size of the subset, and therefore the analysis time, is determined by the accuracy selected by the user. To reduce the complexity of solving the CMEs, effective mathematical techniques have been developed to analyze the subset of the iteration space under consideration; these techniques exploit properties of the particular polyhedra represented by the CMEs.
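The sampling idea in miniature: classify only a random subset of the iteration points and report the observed miss ratio. Here `is_miss` stands in for evaluating the CMEs at one iteration point; the polyhedral machinery itself is not reproduced:

```python
import random

def estimate_miss_ratio(iteration_points, is_miss, sample_fraction, seed=0):
    # Sample size (and hence analysis time) follows from the accuracy
    # the user selects, expressed here as a fraction of the space.
    rng = random.Random(seed)
    n = max(1, int(len(iteration_points) * sample_fraction))
    sample = rng.sample(iteration_points, n)
    misses = sum(1 for point in sample if is_miss(point))
    return misses / n
```

A larger `sample_fraction` buys a tighter estimate at the cost of more per-point evaluations, which is the accuracy/time trade-off the abstract describes.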